NSF PAR Search | NSF Public Access Repository


Detecting structured signals in Ising models

https://doi.org/10.1214/23-AAP1929

Deb, Nabarun; Mukherjee, Rajarshi; Mukherjee, Sumit; Yuan, Ming (February 2024, The Annals of Applied Probability)

Full Text Available
Mean Field Approximations via Log-Concavity

https://doi.org/10.1093/imrn/rnad302

Lacker, Daniel; Mukherjee, Sumit; Yeung, Lane Chun (December 2023, International Mathematics Research Notices)

Abstract: We propose a new approach to deriving quantitative mean field approximations for any probability measure $P$ on $\mathbb{R}^{n}$ with density proportional to $e^{f(x)}$, for $f$ strongly concave. We bound the mean field approximation for the log partition function $\log \int e^{f(x)}\,dx$ in terms of $\sum_{i \neq j}\mathbb{E}_{Q^{*}}|\partial_{ij}f|^{2}$, for a semi-explicit probability measure $Q^{*}$ characterized as the unique mean field optimizer, or equivalently as the minimizer of the relative entropy $H(\cdot\,|\,P)$ over product measures. This notably does not involve metric-entropy or gradient-complexity concepts which are common in prior work on nonlinear large deviations. Three implications are discussed, in the contexts of continuous Gibbs measures on large graphs, high-dimensional Bayesian linear regression, and the construction of decentralized near-optimizers in high-dimensional stochastic control problems. Our arguments are based primarily on functional inequalities and the notion of displacement convexity from optimal transport.
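
For readers unfamiliar with the terminology, the equivalence mentioned in the abstract can be sketched with the standard Gibbs variational principle. This is textbook background rather than a formula quoted from the paper, and the symbol $q$ for the density of $Q$ is an assumption introduced here for illustration. For any probability measure $Q$ on $\mathbb{R}^{n}$ with density $q$,

$$
\log \int e^{f(x)}\,dx \;-\; \Big( \mathbb{E}_{Q}[f] - \int q \log q \, dx \Big) \;=\; H(Q\,|\,P) \;\ge\; 0 .
$$

Restricting $Q$ to product measures turns the bracketed term into the mean field lower bound on the log partition function, so maximizing that bound over product measures is the same as minimizing $H(Q\,|\,P)$ over product measures, and the approximation error is $H(Q^{*}\,|\,P)$ for the optimizer $Q^{*}$.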
Full Text Available
Motif estimation via subgraph sampling: The fourth-moment phenomenon

https://doi.org/10.1214/21-AOS2134

Bhattacharya, Bhaswar B.; Das, Sayan; Mukherjee, Sumit (April 2022, The Annals of Statistics)

Full Text Available
Joint estimation of parameters in Ising model

https://doi.org/10.1214/19-AOS1822

Ghosal, Promit; Mukherjee, Sumit (April 2020, The Annals of Statistics)
Full Text Available
Becoming Good at AI for Good

https://doi.org/10.1145/3461702.3462599

Kshirsagar, Meghana; Robinson, Caleb; Yang, Siyu; Gholami, Shahrzad; Klyuzhin, Ivan; Mukherjee, Sumit; Nasir, Md; Ortiz, Anthony; Oviedo, Felipe; Tanner, Darren; et al. (July 2021, Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21), May 19–21, 2021)
AI for good (AI4G) projects involve developing and applying artificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collaboration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Based on our experiences, we detail the different aspects of this type of collaboration broken down into four high-level categories: communication, data, modeling, and impact, and distill eleven takeaways to guide such projects in the future. We briefly describe two case studies to illustrate how some of these takeaways were applied in practice during our past collaborations.
Full Text Available
Global testing against sparse alternatives under Ising models

https://doi.org/10.1214/17-AOS1612

Mukherjee, Rajarshi; Mukherjee, Sumit; Yuan, Ming (October 2018, The Annals of Statistics)
Full Text Available
Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge

https://doi.org/10.1093/bioinformatics/bty293

Mukherjee, Sumit; Zhang, Yue; Fan, Joshua; Seelig, Georg; Kannan, Sreeram (June 2018, Bioinformatics)

Abstract: Motivation: Single-cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge. Results: We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally, we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells. Availability and implementation: Source code is available at https://github.com/yjzhang/uncurl_python. Supplementary information: Supplementary data are available at Bioinformatics online.
